Introduction

Note: my data has 197 rows instead of 200 because of an error in the data collection script.

I chose the channels of Kanae and Kuzuha, who are both Virtual YouTubers under the talent agency Nijisanji. Together they form a popular unit called ChroNoiR, and individually they are two of the most popular male Virtual YouTubers, so I thought it would be interesting to see how their video metrics compare.

Before accessing the data, I considered comparing their views or likes per video over time, as well as their most commonly used tags. This would have shown how their videos' popularity has changed over their careers and what their videos are most often about. I expected higher growth in views or likes for Kuzuha, and guessed that they both most often make videos about similar games.

Because Kanae has like counts disabled, I instead considered comment counts or view counts over time for my first plot, and went with view counts. In my opinion, view counts are a better measure of engagement: comment counts can be unreliable, since comments are sometimes turned off, or the videos are livestream archives where most of the comments were posted in the live chat.

The trickier aspect turned out to be time. Simply plotting view count over time was uninformative: most videos formed a mass of small points at the bottom of the y-axis, with a few more popular videos sticking out. After some experimenting, I grouped videos by month and plotted each month's mean view count alongside the individual videos, in hopes that the means would make patterns more obvious than the hard-to-distinguish individual view counts.
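A minimal sketch of the monthly grouping, using made-up toy values rather than the real data and assuming `datePublished` parses as a date. Unlike the "yy/mm" string built in the appendix, it keys each month by its first day as a `Date`; this is one possible alternative that keeps months in chronological order, not the method used for the plots.

```r
library(dplyr)

# Made-up toy data standing in for the scraped video metrics
videos <- tibble(
  channelName   = c("Kanae", "Kanae", "Kuzuha", "Kuzuha", "Kuzuha"),
  datePublished = as.Date(c("2023-01-05", "2023-01-20", "2023-01-11",
                            "2023-02-02", "2023-02-15")),
  viewCount     = c(100000, 300000, 500000, 200000, 400000)
)

monthly_means <- videos %>%
  # Key each video by the first day of its month, kept as a Date
  mutate(month = as.Date(format(datePublished, "%Y-%m-01"))) %>%
  group_by(month, channelName) %>%
  # One mean per channel per month; .groups = "drop" returns an ungrouped tibble
  summarise(mean_views = mean(viewCount), .groups = "drop")

monthly_means
```

With a `Date` column, `scale_x_date()` would space months by real elapsed time, whereas the discrete "yy/mm" axis gives every month that has videos the same width and skips months without any.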

Next, I looked for something I could do with a categorical variable. Tags were not collected, and when I opened some videos I noticed that the games usually weren't tagged anyway. So, taking inspiration from the lab tasks and lecture slides, I decided to look at which of their videos were the most popular. This also flows well in the data story: from the views of all their videos to their most viewed videos. I used geom_col to make a bar graph and ran into the problem of the titles being extremely long. My solution was to use the video thumbnails as axis labels with the ggtext package and to place the titles on the bars with geom_text, which both solved the problem and added visual interest.
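The top-n step and the image labels can be sketched together. This version uses made-up titles, view counts, and placeholder `example.com` thumbnail URLs; it picks the top videos per channel with dplyr's `slice_max()` and stores each `<img>` tag in its own column. It is an alternative to the appendix's approach of passing the tags to `scale_y_discrete(labels = ...)`, which only works if the rows are in the same order as the bars.

```r
library(dplyr)

# Made-up toy data; titles, view counts and URLs are placeholders
videos <- tibble(
  channelName  = rep(c("Kanae", "Kuzuha"), each = 3),
  title        = c("Song A", "Stream B", "Stream C", "Song D", "Stream E", "Stream F"),
  viewCount    = c(900, 100, 500, 800, 700, 50),
  thumbnailUrl = paste0("https://example.com/thumb", 1:6, ".jpg")
)

top_videos <- videos %>%
  group_by(channelName) %>%
  # Keep the n most viewed videos per channel (n = 2 for this tiny example)
  slice_max(viewCount, n = 2, with_ties = FALSE) %>%
  ungroup() %>%
  # Store a ggtext-style <img> tag on each row, so the label travels with its video
  mutate(thumb_label = paste0("<img src='", thumbnailUrl, "' height='40'/>"))

top_videos
```

In a plot, mapping `y = reorder(thumb_label, viewCount)` in `aes()`, with `label = title` in `geom_text()` and `theme(axis.text.y = ggtext::element_markdown())`, would keep each thumbnail attached to its own bar regardless of row order.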

Finally, I felt a fitting last plot would be a summary comparing Kanae's and Kuzuha's overall video metrics. The metrics available were duration, views, and comment count. I first thought to compare means, but couldn't find a meaningful and suitably detailed way to visualise them: bar charts didn't fit because I wasn't counting anything, only displaying an average, and plotting just two points felt very awkward. So I combined geom_jitter and geom_boxplot instead, as a more informative way to compare each metric between the channels. The boxplots for views and comments are hard to see because they are squashed at the left side of the graph, but that is informative in its own way: most videos sit far below the few very popular ones.
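A sketch of the same jitter-plus-boxplot idea with a few changes that might help the squashed panels, using simulated right-skewed view counts (not the real data). It uses a log10 x axis to spread out the long tail, sets `outlier.shape = NA` so outliers are not drawn twice (once by the boxplot and once by the jitter layer), and restricts the jitter to the vertical direction so each point keeps its exact value. These are suggestions, not what the plots in the appendix use.

```r
library(ggplot2)

# Simulated right-skewed view counts for two channels (not real data)
set.seed(1)
toy <- data.frame(
  channelName = rep(c("Kanae", "Kuzuha"), each = 50),
  viewCount   = round(10^c(rnorm(50, 5.0, 0.4), rnorm(50, 5.3, 0.5)))
)

p <- ggplot(toy, aes(y = channelName, x = viewCount, colour = channelName)) +
  # Jitter only vertically, so horizontal positions stay exact
  geom_jitter(width = 0, height = 0.2, alpha = 0.3) +
  # Hide the boxplot's own outlier points; the jitter layer already shows them
  geom_boxplot(aes(fill = channelName), alpha = 0.3, outlier.shape = NA) +
  # A log10 axis spreads out the long right tail so the boxes are not squashed
  scale_x_log10(labels = scales::label_number()) +
  guides(colour = "none", fill = "none")

p
```

A log scale cannot represent zero, so videos with 0 comments would need special handling; a pseudo-log transform from the scales package is one possible workaround.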

Dynamic data story

Data story

For creativity, I've used the extra packages patchwork and ggtext to build on the code examples from this class. patchwork let me combine related graphs into a single larger graph for easier comparison, while ggtext let me use thumbnail images as axis labels. I also added a second slide for each plot, blurred with the image_blur function from the magick package and overlaid with brief comments that add context and my thoughts. I adjusted how many frames each slide is shown for, so there is enough time to read it. Overall, I've tried to tell a cohesive story about Kanae's and Kuzuha's career successes over the years.
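The appendix applies `ggplot_theme` to combined plots with `&` rather than `+`, and the difference can be seen in a small sketch. It uses the built-in `mtcars` data as stand-in panels: `+` places plots side by side, `/` stacks them, `plot_annotation()` adds a shared title, and `&` applies a theme to every panel, whereas `+ theme()` would only change the last one.

```r
library(ggplot2)
library(patchwork)

# Stand-in panels built from the built-in mtcars data
p_left  <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p_right <- ggplot(mtcars, aes(hp, mpg)) + geom_point()

# Two panels side by side on top, one panel below, with a shared title;
# `&` applies the background fill to all three panels, including nested ones
combined <- (p_left + p_right) / p_left +
  plot_annotation(title = "Shared title") &
  theme(panel.background = element_rect(fill = "#cadff4"))

combined
```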

Learning reflection

I learned a lot about using ggplot2 and the grammar of graphics to build more complex plots with multiple geoms and custom themes. I also learned more about manipulating data into the shape a plot needs, such as making a top-n ranking by some variable. It was interesting to see how different data frames could be used for different layers of the same plot.
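A minimal sketch of giving each layer its own data frame, with made-up numbers: one layer plots individual videos and another plots a summary table of monthly means, computed here with base R's `aggregate()`. As in the appendix's first plot, `ggplot()` itself is called without data.

```r
library(ggplot2)

# Made-up toy data: one row per video
videos <- data.frame(
  month     = c("23/01", "23/01", "23/02", "23/02"),
  viewCount = c(100, 300, 200, 600)
)
# Summary table: mean views per month
means <- aggregate(viewCount ~ month, data = videos, FUN = mean)

# No data in ggplot(); each layer brings its own data frame
p <- ggplot() +
  geom_point(data = videos, aes(x = month, y = viewCount), shape = 4) +
  geom_point(data = means, aes(x = month, y = viewCount), shape = 18, size = 3)

p
```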

I still want to learn a lot more about ggplot2. I'd like to make my plots look more pleasant, choose better color palettes, and have finer control over the elements of a plot. It would also be great to keep seeing different kinds of plots, so that I can build up a mental library of what data visualisations are possible.

Appendix

library(tidyverse)
library(patchwork)
library(ggtext)


youtube_data <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQyY2RSB9T_ULUN0VoOZDvd9_SMROVtH7JgJ1_94AL04DJM8FcSsN_TdjKHV-pNBWomyl6YkfcFVau4/pub?gid=0&single=true&output=csv")

palette <- c("#cadff4", "#1167ad", "#31476d", "#91679d", "#4781fd", "#ad4444")
ggplot_theme <- theme(
  plot.background = element_rect(fill = palette[1]),
  plot.margin = unit(c(0.1, 0.3, 0.1, 0.3), 'in'),
  panel.background = element_rect(fill = palette[1]),
  panel.border = element_rect(color = palette[2], fill = NA),
  panel.grid = element_line(color = palette[2]),
  panel.grid.major.y = element_line(linewidth = 0.3),
  panel.grid.major.x = element_blank(),
  panel.grid.minor.x = element_blank(),
  legend.background = element_rect(fill = palette[1]),
  axis.ticks = element_line(color = palette[2], linewidth = 0.3),
  text = element_text(color = palette[3]),
  axis.text = element_text(color = palette[2])
)

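# Plot 1: views over time
youtube_data <- youtube_data %>%
  # Build a "yy/mm" label from the publication date so videos can be grouped by month
  mutate(month_year = paste(str_sub(datePublished, 3, 4), str_sub(datePublished, 6, 7), sep = "/"),
         # Drop the first character of the raw channel name so it reads "Kanae" / "Kuzuha"
         channelName = str_sub(channelName, start = 2))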

youtube_data_means <- youtube_data %>%
  group_by(month_year, channelName) %>%
  summarise(mean_views = mean(viewCount)) %>%
  ungroup()

youtube_data_over_7mil <- youtube_data %>%
  filter(viewCount >= 7000000)

plot1 <- ggplot() +
  geom_point(data = youtube_data,
             aes(x = month_year, y = viewCount, color = channelName),
             alpha = 0.4,
             size = 5,
             shape = 4) +
  geom_point(data = youtube_data_means,
             aes(x = month_year, y = mean_views, color = channelName),
             shape = 18,
             size = 3) +
  geom_text(data = youtube_data_over_7mil,
            aes(x = month_year, y = viewCount, color = channelName, label = viewCount),
            nudge_x = 2.5) +
  scale_color_manual(values = c(palette[5], palette[6])) +
  guides(color = "none") +
  labs(title = str_wrap("Kanae and Kuzuha have gotten a reasonably consistent number of views on their videos over time, with some stand-out more popular videos."),
       x = "Year/month",
       y = "View count",
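       # element_markdown() (set below) treats a plain "\n" as a space, so <br> is used for the line break
       subtitle = "Blue: Kanae. Red: Kuzuha.<br>Diamonds: average views that month. Labels: number of views. Purple line: 1,000,000 views.") +
  scale_y_continuous(labels = scales::label_number()) +
  # Reference line at one million views; a constant needs no aes() mapping
  geom_hline(yintercept = 1000000,
             color = palette[4]) +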
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
        plot.subtitle = element_markdown()) +
  ggplot_theme

ggsave("plot1.png", plot1, width = 12, height = 4)

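# Plot 2: top 5 videos by views from each channel
# Take the five most viewed videos, then re-sort them in ascending order of views.
# The ascending order matters: scale_y_discrete(labels = ...) below assigns the
# thumbnail labels by position, so the rows must match the bar order that
# reorder(title, viewCount) produces.
kanae_top_videos <- youtube_data %>%
  filter(channelName == "Kanae") %>%
  arrange(desc(viewCount)) %>%
  slice(1:5) %>%
  arrange(viewCount)

kuzuha_top_videos <- youtube_data %>%
  filter(channelName == "Kuzuha") %>%
  arrange(desc(viewCount)) %>%
  slice(1:5) %>%
  arrange(viewCount)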

plot2a <- kanae_top_videos %>%
  ggplot(aes(y = reorder(title, viewCount), x = viewCount)) +
  geom_col(fill = palette[5], alpha = 0.3, width = 0.5) +
  scale_y_discrete(labels = paste0("<img src = '", kanae_top_videos$thumbnailUrl, "' height = '40'/>")) +
  theme(axis.text.y = element_markdown()) +
  geom_text(aes(label = str_wrap(title, 39, whitespace_only = FALSE)),
            hjust = 1,
            color = palette[2]) +
  labs(subtitle = "Kanae",
       x = "Number of views",
       y = NULL) +
  scale_x_continuous(labels = scales::label_number())

plot2b <- kuzuha_top_videos %>%
  ggplot(aes(y = reorder(title, viewCount), x = viewCount)) +
  geom_col(fill = palette[6], alpha = 0.3, width = 0.5) +
  scale_y_discrete(labels = paste0("<img src = '", kuzuha_top_videos$thumbnailUrl, "' height = '40'/>")) +
  theme(axis.text.y = element_markdown()) +
  geom_text(aes(label = str_wrap(title, 28, whitespace_only = FALSE)),
            hjust = 1,
            color = palette[2]) +
  labs(subtitle = "Kuzuha",
       x = "Number of views",
       y = NULL) +
  scale_x_continuous(labels = scales::label_number())

plot2 <- plot2a + plot2b +
  plot_annotation(title = "Out of those, their most viewed videos are their song covers.") &
  ggplot_theme

ggsave("plot2.png", plot2, width = 12, height = 4)

# Plot 3: overall boxplot comparisons for each channel

# Draws jittered points plus a boxplot of one metric for each channel
channel_boxplot <- function(data, metric, x_label) {
  data %>%
    ggplot(aes(y = channelName, x = {{ metric }})) +
    geom_jitter(aes(color = channelName), alpha = 0.3) +
    geom_boxplot(aes(color = channelName, fill = channelName),
                 alpha = 0.3) +
    scale_x_continuous(labels = scales::label_number()) +
    scale_color_manual(values = c(palette[5], palette[6])) +
    scale_fill_manual(values = c(palette[5], palette[6])) +
    guides(color = "none", fill = "none") +
    labs(y = "Channel", x = x_label)
}

plot3_a <- channel_boxplot(youtube_data, duration, "Duration of video (seconds)")
plot3_b <- channel_boxplot(youtube_data, viewCount, "Number of views")
plot3_c <- channel_boxplot(youtube_data, commentCount, "Number of comments")

plot3 <- plot3_a / plot3_b / plot3_c +
  plot_annotation(title = str_wrap("Overall, they tend to have similar video durations, views, and comments, with Kuzuha's being somewhat higher.")) &
  ggplot_theme

ggsave("plot3.png", plot3, width = 12, height = 4)

# Animated data story, built from the saved plot images
library(magick)
library(tidyverse)

palette <- c("#cadff4", "#1167ad", "#31476d", "#91679d", "#4781fd", "#ad4444")

blank_slide <- image_blank(width = 1200,
                           height = 400,
                           color = palette[1])

title <- blank_slide %>%
  image_annotate("ChroNoiR's streaming careers",
                 gravity = "center",
                 size = "56",
                 font = "Noto Sans",
                 location = "+0-40",
                 color = palette[3]) %>%
  image_annotate("Examining a random sample of Kanae's and Kuzuha's videos",
                 style = "italic",
                 gravity = "center",
                 size = "36",
                 location = "+0+20",
                 color = palette[2])

plot1 <- image_read("plot1.png") %>%
  image_scale("1200x400")
plot1_comments <- plot1 %>%
  image_blur(radius = 20, sigma = 3) %>%
  image_annotate(str_wrap("It looks like Kuzuha especially has some videos which went quite viral. For both VTubers, older viral videos seem to have had more time to accumulate views than newer ones.", 60),
                 gravity = "center",
                 size = "30",
                 color = palette[2],
                 boxcolor = palette[1],
                 font = "Noto Sans")

plot2 <- image_read("plot2.png") %>%
  image_scale("1200x400")
plot2_comments <- plot2 %>%
  image_blur(radius = 20, sigma = 3) %>%
  image_annotate(str_wrap("This sample doesn't include all their videos; Kuzuha's cover of KING by Kanaria is famous for its 47 million views (at the time of writing). But it makes sense that their song covers would be watched and rewatched by a bigger audience than their gaming streams.", 60),
                 gravity = "center",
                 size = "30",
                 color = palette[2],
                 boxcolor = palette[1],
                 font = "Noto Sans")

plot3 <- image_read("plot3.png") %>%
  image_scale("1200x400")
plot3_comments <- plot3 %>%
  image_blur(radius = 20, sigma = 3) %>%
  image_annotate(str_wrap("Both VTubers post streams which are hours long, though Kanae's are somewhat shorter. Kuzuha usually gets more views, with a wider spread, but their typical comment counts don't differ much.", 60),
                 gravity = "center",
                 size = "30",
                 color = palette[2],
                 boxcolor = palette[1],
                 font = "Noto Sans")

conclusion <- blank_slide %>%
  image_annotate(str_wrap("As ChroNoiR approach their 6th anniversary as a unit, Kanae and Kuzuha are still doing consistently well in their solo careers.", 70),
                 gravity = "northwest",
                 size = "30",
                 location = "+10+10",
                 color = palette[2],
                 font = "Noto Sans") %>%
  image_annotate(str_wrap("Kuzuha especially has videos which often become hits even outside of his regular viewers.", 70),
                 gravity = "east",
                 size = "30",
                 location = "+10-30",
                 color = palette[2],
                 font = "Noto Sans") %>%
  image_annotate(str_wrap("Although Kanae and Kuzuha are mainly gamers with hours-long game streams, their music videos do particularly well. That they've maintained large and varied fanbases for all these years speaks to their overall success as well-rounded entertainers.", 70),
                 location = "+0+10",
                 gravity = "south",
                 size = "30",
                 color = palette[3],
                 font = "Noto Sans")

frames <- c(rep(title, 5),
            rep(plot1, 10),
            rep(plot1_comments, 9),
            rep(plot2, 12),
            rep(plot2_comments, 9),
            rep(plot3, 10),
            rep(plot3_comments, 9),
            rep(conclusion, 12))

data_story <- image_animate(frames, fps = 1)

image_write(data_story, "data_story.gif")